Introduction to R

A language about more than statistics

Dr. Peng Zhao (✉ peng.zhao@xjtlu.edu.cn)

Department of Health and Environmental Sciences

Xi’an Jiaotong-Liverpool University

1 Objectives

  • Know what R can do
  • Set up the R/RStudio environment
  • Understand how R works
  • Perform basic operations in R

2 Installation

  • Online compiler

  • Main program (Mandatory): R

  • Integrated Development Environment (Highly recommended): RStudio

  • R Packages

install.packages(c("beginr", "ggplot2", "GGally", "ggplotgui", "learnr", "mindr", "MSG", "pinyin", "Rcmdr", "plotly", "remotes", "swirl"))
remotes::install_github("pzhaonet/fecitr")
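Before installing everything, it can be useful to see which of the packages above are already on the computer. The following is a minimal sketch using base R only; the variable names `pkgs`, `installed` and `missing_pkgs` are illustrative and not part of the course material.

```r
# Packages listed in the install.packages() call above
pkgs <- c("beginr", "ggplot2", "GGally", "ggplotgui", "learnr", "mindr",
          "MSG", "pinyin", "Rcmdr", "plotly", "remotes", "swirl")

# installed.packages() returns a matrix with one row per installed package
installed <- pkgs %in% rownames(installed.packages())

# Names of the packages that still need to be installed
missing_pkgs <- pkgs[!installed]
print(missing_pkgs)

# Install only what is missing (uncomment to run; needs internet access)
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```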

3 What is R

R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, data mining surveys, and studies of scholarly literature databases show substantial increases in popularity; as of April 2021, R ranks 16th in the TIOBE index, a measure of popularity of programming languages.

— Wikipedia: R (programming language)

R is far more than that. It is a way to communicate with your computer.

4 What can R do

library(beginr)
plotcolors()

library(pinyin)
py('西交利物浦大学', dic = pydic())

library(mindr)
mm(c('# Pros', '# Cons'), root = 'R language')

5 Basic operation

5.1 Hotkeys

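  • Ctrl + Enter: run the current line or selection
  • TAB: autocomplete names
  • F1: open the help page for the function at the cursor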

5.2 Demo data

write.csv(iris, 'dat.csv', row.names = FALSE)

5.3 Import data

dat <- read.csv('dat.csv')
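After importing, it is worth checking what actually arrived. The sketch below assumes R 4.0 or later, where read.csv() reads text columns as character rather than factor. It writes to a temporary file (the name `tmp` is an illustrative choice) so that it runs on its own without touching dat.csv.

```r
# Write the built-in iris data to a temporary CSV file and read it back
tmp <- tempfile(fileext = ".csv")
write.csv(iris, tmp, row.names = FALSE)
dat <- read.csv(tmp)

str(dat)        # column names and types
head(dat, 3)    # first three rows
dim(dat)        # number of rows and columns

# Convert the text column to a factor so that summary() counts each species
dat$Species <- factor(dat$Species)
summary(dat$Species)
```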

5.4 Statistics/Calculation

# mean and standard deviation
mean(dat$Sepal.Length)
sd(dat$Sepal.Length)

# more statistics
summary(dat)

# groups
tapply(dat$Sepal.Length, dat$Species, mean)
tapply(dat$Sepal.Length, dat$Species, sd)

# analysis of variance
xx <- aov(Sepal.Length ~ Species, data = dat)
summary(xx)

# regression
mylm <- lm(Petal.Width ~ Petal.Length, data = dat)
summary(mylm)
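A fitted model object stores more than summary() prints. Here is a sketch of pulling out individual numbers with base R accessor functions. It fits the model on the built-in iris data so that it runs on its own, and it uses the `data =` form of the formula, which predict() needs in order to work with new data. The petal lengths in `new_dat` are made-up values for illustration.

```r
# Fit the regression on the built-in iris data
mylm <- lm(Petal.Width ~ Petal.Length, data = iris)

coef(mylm)                        # intercept and slope
confint(mylm)                     # 95% confidence intervals for both
r2 <- summary(mylm)$r.squared     # proportion of variance explained
r2

# Predict petal width for new petal lengths (illustrative values)
new_dat <- data.frame(Petal.Length = c(2, 4, 6))
pred <- predict(mylm, newdata = new_dat)
pred
```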

5.5 Graphs

plot(x = dat$Petal.Length, 
     y = dat$Petal.Width)
abline(mylm)
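To keep a graph, it can be written to a file instead of the screen. This is a minimal sketch with the base pdf() device; the file name, size, axis labels and line colour are arbitrary choices, and the iris data are used directly so that the example runs on its own.

```r
# Fit the regression on the built-in iris data
mylm <- lm(Petal.Width ~ Petal.Length, data = iris)

# Open a PDF device; width and height are in inches
out_file <- file.path(tempdir(), "petal_scatter.pdf")
pdf(out_file, width = 7, height = 5)

plot(x = iris$Petal.Length,
     y = iris$Petal.Width,
     xlab = "Petal length (cm)",
     ylab = "Petal width (cm)")
abline(mylm, col = "red")   # add the fitted regression line

dev.off()                   # close the device to finish writing the file
```

Replacing pdf() with png() gives a bitmap image instead; in that case width and height are in pixels by default.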

5.6 Packages

summary(dat)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  

library(fecitr)
plot_summary(dat, base = "hist", if_box = TRUE)

library(ggplot2)
ggplot(dat, aes(Petal.Length,Petal.Width))+ 
  geom_point() + 
  geom_smooth(method = "lm")

library(GGally)
ggpairs(dat, aes(color = Species, alpha = 0.1))

# interactive version (the native pipe |> needs R >= 4.1)
library(plotly)
ggpairs(dat, aes(color = Species, alpha = 0.1)) |> 
  ggplotly()

5.7 Export data

dat$new <- dat$Sepal.Length - mean(dat$Sepal.Length)
write.csv(dat, "dat2.csv", row.names = FALSE)
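A quick way to check an export is to read the file back and compare it with the original. The sketch below starts from the built-in iris data and writes to a temporary folder (the names `out_file` and `dat2` are illustrative). It uses row.names = FALSE as in the demo-data step. Without that argument, write.csv() adds a column of row numbers, which shows up as an extra column named X when the file is read again.

```r
# Build the data frame with the centred column
dat <- iris
dat$new <- dat$Sepal.Length - mean(dat$Sepal.Length)

# Export without row names, then import again
out_file <- file.path(tempdir(), "dat2.csv")
write.csv(dat, out_file, row.names = FALSE)
dat2 <- read.csv(out_file)

names(dat2)                     # same columns as dat, including "new"
all.equal(dat$new, dat2$new)    # values survive the round trip
abs(mean(dat$new)) < 1e-12      # a centred variable has a mean of about zero
```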

5.8 GUI

library(Rcmdr)

library(ggplotgui)
ggplot_shiny()

6 Pros & Cons

Software  Difficulty  Type  Cost       Usage        Support          Best for
Excel     Easy        GUI   Cheap      Wide         Widespread       Graphs
R         Difficult   Code  Free       Increasing   Strongly online  Cutting edge
SPSS      Medium      GUI   Expensive  Social Sci.  Manual           Statistics
SAS       Difficult   Code  Expensive  Decreasing   Manual           Complex

7 Move forward

7.1 Partners

7.2 Help documents

demo(graphics)
demo(persp)
demo(image)
demo(plotmath)
demo(nlm)
demo(lm.glm)
demo(smooth)

# ggplot2
example(qplot)

# GGally
example(ggpairs)

# MSG
library(MSG)
demo(basketball)
demo(pointArts)
demo(gradArrows1) # Gradient descent method
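Besides the demos, every function has its own help page. Here is a short sketch of looking things up from the console with base R tools only. The interactive calls are wrapped in `if (interactive())` so that the block also runs as a script; the variable name `readers` is illustrative.

```r
if (interactive()) {
  help("mean")     # open the help page for mean(); same as ?mean
  example(mean)    # run the examples from that help page
}

# Show the arguments of a function without opening its help page
args(sd)

# Find objects on the search path whose names start with "read."
readers <- apropos("^read\\.")
readers
```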

7.3 R packages

library(swirl)

library(learnr)
run_tutorial("ex-data-basics", "learnr")

7.4 Books

7.5 Search engine

7.6 Forums